Abstract


From ~5000 deg$^2$ of the combined footprint of the Beijing-Arizona Sky Survey and the Mayall z-band Legacy Survey, which
is also the northern sky region of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys, we
selected a sample of 31,825 low surface brightness galaxy (LSBG) candidates with mean effective surface
brightness $24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$ and half-light radius $2.5'' < r_{\mathrm{eff}} < 20''$, based on the released
photometric catalog and a machine learning model. The distribution of the LSBGs is bimodal in the $g - r$ color,
indicating two distinct populations of blue ($g - r < 0.60$) and red ($g - r > 0.60$) LSBGs. The blue LSBGs
appear spiral, disk-like or irregular, while the red LSBGs are spheroidal or elliptical. This trend
shows that color correlates strongly with morphology for LSBGs. In the spatial distribution, the
blue LSBGs are more uniformly distributed while the red ones are highly clustered, indicating that red LSBGs
preferentially populate denser environments than the blue LSBGs. In addition, both populations have consistent
distributions of ellipticity (median $e \approx 0.3$), half-light radius (median $r_{\mathrm{eff}} \approx 4''$) and Sérsic index (median $n = 1$),
implying that the full sample is dominated by round, disk-like galaxies. This sample extends the
study of LSBGs to a regime of lower surface brightness, fainter magnitude and a broader range of other properties than the
previous Sloan Digital Sky Survey-based samples.


Key words: catalogs — galaxies: fundamental parameters — galaxies: statistics — techniques: photometric 


1. Introduction 


Low surface brightness galaxies (LSBGs) are traditionally
defined as galaxies with B-band central surface brightness
($\mu_0$) fainter than a threshold value in the range 21.65-23.0 mag arcsec$^{-2}$
(Freeman 1970; Impey & Bothun 1997; O'Neil et al. 1997;
Zhong et al. 2008; Du et al. 2015). In addition, the $\mu_0$
in other optical or near-infrared bands, such as the
r (Courteau 1996), R (Adami et al. 2006) and Ks bands
(Monnier Ragaigne et al. 2003), has been adopted to
distinguish LSBGs from high surface brightness
galaxies (HSBGs). Besides $\mu_0$, the mean surface
brightness within the effective radius ($\bar{\mu}_{\mathrm{eff}}$) has also been used
to define LSBGs. For example, the g-band criterion
$\bar{\mu}_{\mathrm{eff},g} > 24.2$-$24.3$ mag arcsec$^{-2}$ was used to select
LSBGs in Greco et al. (2018) and Tanoglidis et al. (2021b),
allowing nucleated galaxies to be retained in the
sample.

LSBGs are characterized by their diffuse, extended, low- 
density stellar disks and most of them are blue in color (de Blok 
et al. 1996; Burkholder et al. 2001; O'Neil et al. 2004; 
Trachternach et al. 2006; Vorobyov et al. 2009; Zhang et al. 
2024). In terms of morphology, they are disk-like or irregular 
(de Blok & McGaugh 1996, 1997; de Blok et al. 2001). 
Compared to HSBGs, LSBGs have different properties,
including low star formation rates (van der Hulst et al. 1993;
van Zee et al. 1997; van den Hoek et al. 2000; Wyder et al.
2009; Galaz et al. 2011, 2022; Schombert et al. 2011; Lei et al.
2018, 2019), low metallicities (de Blok & van der
Hulst 1998a, 1998b; Kuzio de Naray et al. 2004; Du et al. 
2017), high gas fractions (Huang et al. 2014; Du et al. 2015; He 
et al. 2020), low dust content (Matthews et al. 2001; Hinz et al. 
2007; Rahman et al. 2007) and low active galactic nucleus 
(AGN) fraction (Galaz et al. 2011), which indicate that LSBGs 
are different in terms of star formation and evolutionary history 
from HSBGs. Therefore, it is vital to study LSBGs to complete 
the current paradigm of galaxy formation and evolution. 
Moreover, given that LSBGs contribute approximately 20%
(Minchin et al. 2004) to the dynamical mass of the galaxies in 
the universe and ~30%-60% (McGaugh et al. 1995; McGaugh 
1996; Bothun et al. 1997; O'Neil et al. 2000; Trachternach 
et al. 2006; Haberzettl et al. 2007; Martin et al. 2019) to the 
number density of galaxies in the local universe, LSBGs play a 
significant role in understanding the universe. 

In the past, research on LSBGs was primarily concentrated on
small regions such as massive galaxy clusters (Sabatini et al.
2005; van Dokkum et al. 2015; Venhola et al. 2017), satellites
of nearby galaxies (Martin et al. 2013; Cohen et al. 2018) and
other nearby clusters. However, with the advancement of
modern observational technology and the emergence of larger,
more sensitive telescopes, it has become possible to perform
untargeted searches for LSBGs using deep, wide-field
imaging surveys. In recent decades, wide-field galaxy
surveys have revealed a large number of LSBGs. For example,
Zhong et al. (2008) established a sample of 12,282 face-on LSBGs
from the Sloan Digital Sky Survey (SDSS; York et al. 2000) DR4.
Greco et al. (2018) discovered 781 LSBGs
with an untargeted search in the Hyper Suprime-Cam Subaru
Strategic Program (HSC-SSP; Aihara et al. 2018a). Recently,
Tanoglidis et al. (2021b) constructed a large sample of 23,790
LSBGs based on the first three years of data from the Dark
Energy Survey (DES; The Dark Energy Survey Collaboration
2005). In addition, the SDSS DR7 imaging survey and
the 40% Arecibo Legacy Fast ALFA Survey (Giovanelli 2007)
have been combined to search for samples with low optical
surface brightness and abundant neutral hydrogen gas (Du
et al. 2015; He et al. 2020). More recently, candidates of
ultra-diffuse galaxies, a subset of LSBGs with g-band
$\mu_0 \gtrsim 24$ mag arcsec$^{-2}$ and effective radii $r_{\mathrm{eff}} > 1.5$ kpc (van
Dokkum et al. 2015), were selected by Zaritsky et al.
(2022, 2023) from the Dark Energy Spectroscopic Instrument
(DESI) Legacy Imaging Surveys (hereafter referred to as the
Legacy Surveys; Dey et al. 2019).

In recent years, the advent of increasingly deep and wide
imaging surveys has brought unprecedented opportunities to detect
numerous LSBGs with much fainter surface brightness than
before. The existing LSBG samples are dominated by
brighter LSBGs (<24.0 mag arcsec$^{-2}$). Samples drawn from
images at previously unreachable depth would complement them
with fainter LSBGs of much lower surface
brightness, which would help to refine or
complete the existing conclusions that are biased toward
LSBGs with brighter surface brightness, and provide new
constraints on galaxy formation theory and cosmological
models. Thanks to the increasingly widespread application of
computer techniques in astronomy (Cheng et al.
2020), it is possible to expedite the search for LSBGs in
the continuously growing volume of astronomical data with the help of
techniques such as machine learning.
For example, Tanoglidis et al. (2021b) searched for LSBGs
in the data from the first three years of DES observing (DES
Y3) using machine learning techniques. In other work,
deep learning techniques have been used to identify LSBGs from
digital sky survey images (Zaritsky et al. 2019; Tanoglidis et al.
2021a; Yi et al. 2022). Motivated by these studies, in this paper we construct a
catalog of LSBG candidates from DR9 of the
northern portion of the Legacy Surveys with the aid of
a machine learning technique.

In this paper, we briefly describe the Legacy Surveys data
in Section 2, and describe the
selection of the sample of LSBG candidates from the photometric catalog and
machine learning in Section 3. We study the properties of the
sample of LSBG candidates in Section 4, such as the color
distribution, spatial distribution and other properties. Finally,
we compare the LSBG candidates with the LSBG samples
from several previous publications in Section 5 and give a
summary in Section 6.


2. Data 


The DESI Legacy Surveys observed
~14,000 deg$^2$ of extragalactic sky in three optical bands (g,
r, z). The 5σ point-source depths for DR9 of the Legacy
Surveys are about g = 24.7, r = 23.9 and z = 23.0 AB mag,
considerably deeper than those of the SDSS images, which are
g = 23.13, r = 22.70 and z = 20.71, respectively. Therefore,
the data from the Legacy Surveys are expected to contain
numerous galaxies with much lower surface brightness than the
SDSS data.

The Legacy Surveys are composed of three
imaging projects: the Beijing-Arizona Sky Survey (BASS;
Zou et al. 2017), the Mayall z-band Legacy Survey (MzLS) and
the Dark Energy Camera Legacy Survey (DECaLS). Specifically,
BASS has surveyed an area of ~5500 deg$^2$, which is
dominated by the region of the sky at decl. > +32° (with only
~4% located at decl. < +32°), in the optical g and r bands
using the Bok 2.3 m telescope at Kitt Peak. The MzLS has
observed nearly the same sky region as BASS at
decl. > +32° but in the z band, providing a
complementary band that extends the wavelength coverage of
BASS. Hereafter, we refer to the combined survey of BASS and
MzLS at decl. ≥ +32° as BASS+MzLS, and we
select the LSBG candidates from the BASS+MzLS data.


3. LSBG Catalog 


In this section, we describe the procedures that we
followed to select the LSBG candidates from
BASS+MzLS, based on the combination of the publicly available
photometric catalog produced by the Tractor software (Lang
et al. 2016) for DR9 and a machine learning technique.


3.1. Initial Sample Selection 


The Tractor catalog of DR9 for BASS+MzLS provides
valuable properties for the total sample of 364,277,779
extracted sources, including astrometry, photometry and
geometry. Based on several key properties in this catalog,
we select the LSBG candidates step by step according to the following
procedures.

First, since most LSBGs are known to be dominated
by an extended disk, we remove sources with morphological
type (type) of PSF, DUP or DEV from the total sample. This
criterion excludes point sources, coincident Gaia sources and
elliptical galaxies, and 187,492,198 sources
(~51.470% of the total sample) are retained.

Second, we require the sources to have a half-light radius (as
measured via the shape_r parameter in the Tractor catalog)
$r_{\mathrm{eff}} > 2.5''$ to focus on extended galaxies, and
simultaneously require $r_{\mathrm{eff}} < 20''$ to reject spurious sources or imaging
artifacts, following Greco et al. (2018) and Tanoglidis et al.
(2021b), where detections larger than this scale in HSC-SSP
or DES images were found to be rare and generally
spurious. By this criterion, 2,999,940 sources (out of
187,492,198) are retained.

Then, to avoid sources that are seriously polluted by nearby
contaminants, which would make the model fitting results
unreliable, we require our sources to satisfy the following criteria
according to Ruiz-Macias et al. (2020):


$$
\begin{aligned}
\mathtt{fracmasked}_X &< 0.5 \\
\mathtt{fracflux}_X &< 5 \\
\mathtt{fracin}_X &> 0.3,
\end{aligned}
\tag{1}
$$


where X represents the g, r, or z bands. The fracmasked_X,
fracflux_X and fracin_X are parameters in the Tractor catalog
that probe the quality of the model fitting for the
sources. Specifically, fracmasked_X, the profile-weighted
fraction of pixels masked in all observations of the target
object, is used to remove sources with a high fraction of
masked pixels. The fracflux_X, the profile-weighted fraction of
the flux from other sources divided by the target object flux, is
used to reject objects with heavily contaminated flux. The
fracin_X, the fraction of the flux from the target source that falls within the
blob (a contiguous group of pixels), is used to select sources with a large
fraction to ensure well-constrained model fits. By this criterion,
1,622,986 sources, approximately 0.446% of the total sample,
are retained.
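The first three selection steps can be sketched as a single per-source test. This is only an illustrative sketch, not the authors' pipeline: it assumes each source is available as a dictionary keyed by Tractor column names (type, shape_r, fracmasked_g, and so on), and the helper make_row and the example values are hypothetical.

```python
BANDS = ("g", "r", "z")
EXCLUDED_TYPES = {"PSF", "DUP", "DEV"}


def passes_initial_cuts(src):
    """Apply the type cut, the half-light radius cut and Equation (1)."""
    if src["type"] in EXCLUDED_TYPES:
        return False
    # Half-light radius in arcsec, strictly between 2.5 and 20.
    if not (2.5 < src["shape_r"] < 20.0):
        return False
    # Equation (1) must hold in every band.
    for band in BANDS:
        if not (src[f"fracmasked_{band}"] < 0.5
                and src[f"fracflux_{band}"] < 5.0
                and src[f"fracin_{band}"] > 0.3):
            return False
    return True


def make_row(**overrides):
    """Build a hypothetical catalog row with benign default values."""
    row = {"type": "EXP", "shape_r": 4.0}
    for band in BANDS:
        row[f"fracmasked_{band}"] = 0.1
        row[f"fracflux_{band}"] = 0.2
        row[f"fracin_{band}"] = 0.9
    row.update(overrides)
    return row


clean = make_row()                  # extended, uncontaminated source
star = make_row(type="PSF")         # rejected by the type cut
blended = make_row(fracflux_r=7.5)  # rejected by Equation (1)
print(passes_initial_cuts(clean), passes_initial_cuts(star), passes_initial_cuts(blended))
```

For a catalog of several hundred million rows, a vectorized implementation on array columns would be the practical choice; the per-row form is used here only to keep the logic explicit.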

Before applying the next criterion, we correct the flux for
Galactic extinction and convert it to a magnitude with the
prescription (Equation (2)) given by the Legacy Surveys


$$
\begin{aligned}
m_X &= 22.5 - 2.5 \log_{10}(F_X) \\
F_{\mathrm{corr},X} &= F_X / \mathrm{MW}_X \\
m_{\mathrm{corr},X} &= 22.5 - 2.5 \log_{10}(F_{\mathrm{corr},X}),
\end{aligned}
\tag{2}
$$


where X represents the g, r, or z bands. $F_X$ is the model flux in
the X band, measured as flux_X in units of nanomaggies in the
Tractor catalog, and $\mathrm{MW}_X$ is the Galactic transmission at the object
position in the X filter, measured as mw_transmission_X in
linear units from 0 to 1 in the catalog, where 1 represents a fully
transparent region of the Milky Way and 0 a fully opaque
region. $F_{\mathrm{corr},X}$, the Galactic-extinction corrected flux, is further
converted to the magnitude $m_{\mathrm{corr},X}$, from which the colors
$g - r$ and $g - z$ are obtained.
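As an illustration of Equation (2), the following minimal sketch converts a catalog flux in nanomaggies and a Galactic transmission into an extinction-corrected AB magnitude. The function name and the example values are hypothetical; returning None for non-positive fluxes is a choice made here, not something specified in the text.

```python
import math


def corrected_magnitude(flux_nmgy, mw_transmission):
    """Extinction-corrected AB magnitude following Equation (2).

    flux_nmgy: model flux in nanomaggies (flux_X in the catalog).
    mw_transmission: Galactic transmission in (0, 1] (mw_transmission_X).
    Returns None when the flux is not positive (magnitude undefined).
    """
    if flux_nmgy <= 0.0 or mw_transmission <= 0.0:
        return None
    flux_corr = flux_nmgy / mw_transmission
    return 22.5 - 2.5 * math.log10(flux_corr)


# A 10 nanomaggy source seen through a fully transparent sightline.
print(corrected_magnitude(10.0, 1.0))
# The same observed flux behind 20% extinction is intrinsically brighter.
print(corrected_magnitude(10.0, 0.8))
```

Colors then follow as differences of corrected magnitudes, for example corrected_magnitude(flux_g, mw_g) - corrected_magnitude(flux_r, mw_r).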
Figure 1. The $g - r$ vs. $g - z$ diagram of the total sample of BASS+MzLS.
The three black contours from the inside out enclose 68.2%, 95.6% and
99.8% of the total sample, respectively. The red box is our color box expressed
in Equation (3), enclosing the most densely populated region of the galaxies
while rejecting high-redshift galaxies and spurious objects.


After that, we require the colors to be within the color box
defined by


$$
\begin{aligned}
-0.4 &< g - z < 2.3 \\
(g - r) &< 0.66 \times (g - z) + 0.6 \\
(g - r) &> 0.6 \times (g - z) - 0.1.
\end{aligned}
\tag{3}
$$


This color box was empirically determined based on the
distribution of the total sample of BASS+MzLS in the
$g - r$ versus $g - z$ diagram, as shown in Figure 1, where the
three black solid contours from the inside out enclose
68.2%, 95.6% and 99.8% of the total sample, respectively. In
determining the color requirements (the red box in Figure 1),
our principle was to include the majority of the galaxies within the
central contour, where 68.2% of the total sample gathers, while
excluding high-redshift galaxies and spurious objects. By
satisfying the color requirements (Equation (3)), 994,459
sources (~0.273% of the total sample) are retained.
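The color box of Equation (3) reduces to three inequalities. The sketch below is a hypothetical helper that uses the coefficients as they appear in Equation (3) of this text; the example colors are invented for illustration.

```python
def in_color_box(g_r, g_z):
    """Return True if the (g - r, g - z) colors satisfy Equation (3)."""
    return (-0.4 < g_z < 2.3
            and g_r < 0.66 * g_z + 0.6
            and g_r > 0.6 * g_z - 0.1)


# A typical blue galaxy color falls inside the box.
print(in_color_box(0.45, 0.70))
# A very red g - z color is rejected by the first condition.
print(in_color_box(0.45, 3.00))
# A g - r color far below the lower boundary is rejected as well.
print(in_color_box(-0.50, 1.00))
```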

Subsequently, we require the ellipticity (1 — b/a) to be less 
than 0.7 (an axis ratio, b/a, greater than 0.3) to avoid edge-on 
galaxies, some spurious objects with high ellipticity (e.g., 
diffraction spikes), or the most obvious lensed galaxies. By this 
criterion, 772,745 sources (0.212% of the total sample) are 
retained. 

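Finally, we calculate the mean surface brightness within the
half-light radius, $\bar{\mu}_{\mathrm{eff},X}$, by using Equation (4)


$$
\bar{\mu}_{\mathrm{eff},X} = 22.5 - 2.5 \log_{10}\left(\frac{F_{\mathrm{corr},X}}{2\pi r_e^2}\right), \tag{4}
$$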
Table 1
LSBG Selection Parameters

Criterion                                            Range              LSBG Candidates   Percent
No cut                                               NA                 364,277,779       100.000%
Type                                                 != PSF, DEV, DUP   187,492,198       51.470%
$r_{\mathrm{eff}}$                                   2.5"-20"           2,999,940         0.824%
fracmasked, fracflux, fracin                         Equation (1)       1,622,986         0.446%
Color                                                Equation (3)       994,459           0.273%
Ellipticity                                          <0.7               772,745           0.212%
$\bar{\mu}_{\mathrm{eff},g}$ (mag arcsec$^{-2}$)     24.2-28.8          344,370           0.095%
Machine Learning                                     ...                57,934            0.016%
Visual Inspection                                    ...                31,825            0.009%


where $r_e$ is the half-light radius, measured as shape_r in the
Tractor catalog. We require $\bar{\mu}_{\mathrm{eff},g}$ to be within
$24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$ and obtain 344,370 objects
(0.095% of the total sample) as the initial sample of LSBG
candidates.
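The surface brightness cut can be sketched as follows. The sketch assumes the standard definition in which half of the total flux falls within the half-light radius, so that the mean surface brightness is the total corrected flux divided by $2\pi r_e^2$; the function names and the example source are hypothetical.

```python
import math


def mean_effective_sb(flux_corr_nmgy, r_eff_arcsec):
    """Mean surface brightness within the half-light radius (mag arcsec^-2).

    Assumes half of the total flux lies within r_eff, so the mean flux
    density is (F / 2) / (pi * r_eff^2) = F / (2 * pi * r_eff^2).
    """
    area_term = 2.0 * math.pi * r_eff_arcsec ** 2
    return 22.5 - 2.5 * math.log10(flux_corr_nmgy / area_term)


def passes_sb_cut(mu_eff_g):
    """Surface brightness window used for the initial LSBG candidates."""
    return 24.2 < mu_eff_g < 28.8


# Hypothetical source: g = 20 mag (10 nanomaggies) with r_eff = 4 arcsec.
mu = mean_effective_sb(10.0, 4.0)
print(round(mu, 3), passes_sb_cut(mu))
```

Equivalently, the result equals the corrected magnitude plus $2.5 \log_{10}(2\pi r_e^2)$, which makes explicit that larger sources of the same magnitude are more diffuse.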

To give a clear picture of our selection of the
initial LSBG candidates so far, the selection criteria above are
listed in Table 1. Up to this point, the initial LSBG
candidates were selected solely from the Tractor catalog.
We therefore inspected the images of a few thousand
initial LSBG candidates and found that a large number of the
candidates were false LSBGs that were in fact
contaminants. It is thus necessary to reject those
false LSBG candidates from the numerous initial candidates via
machine learning techniques.


3.2. Machine Learning Classification 


From our visual inspection, the most common sources of
contamination among the false LSBGs were:


1. Red objects with high ellipticity close to the criterion of
0.7 (e.g., Figure 2(a)).

2. Detections that are almost invisible in the images (e.g.,
Figure 2(b)).

3. Diffuse light from nearby bright stars (e.g.,
Figure 2(c) and (f)).

4. Faint, diffuse regions of larger-scale objects, such as
Galactic cirrus (e.g., Figure 2(d)).

5. Diffuse light from the arms of large spiral galaxies (e.g.,
Figure 2(e)).


Aiming to reject the false LSBGs from the initial sample of
LSBG candidates while keeping the completeness
of the true LSBGs as high as possible, we employed a
supervised machine learning classification algorithm.


3.2.1. Training and Test Sets


To prepare a labeled sample, with objects labeled
as either true or false LSBGs, for training and testing the
machine learning model, we visually inspected the images of
all 22,710 initial LSBG candidates within 26 sky
areas (blue areas in Figure 3) that we selected to be
distributed uniformly over the BASS+MzLS footprint. To
alleviate subjective biases, three individuals
performed the visual inspections independently, classifying each
candidate as a true or a false LSBG. The results from
the three were then combined as the final results. Ultimately, we
labeled the 2561 candidates identified as true LSBGs by
more than two individuals as LSBGs and labeled the remaining
20,149 as non-LSBGs. Then, 70% of the
22,710 labeled objects were adopted as the training set while the
other 30% were used as the test set. We
used the training set to train a model and evaluated the quality
of the trained model using the test set.
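The 70/30 division of the labeled sample can be sketched as a seeded random split. How the split was actually drawn is not stated in the text, so the shuffling, the seed and the function name below are assumptions for illustration only.

```python
import random


def split_labeled_sample(n_objects, train_tenths=7, seed=0):
    """Randomly split object indices into training and test sets.

    Integer arithmetic is used for the split point so that 70% of
    22,710 is exactly 15,897 without floating-point rounding issues.
    """
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    n_train = n_objects * train_tenths // 10
    return indices[:n_train], indices[n_train:]


# Labels as described in the text: 2561 LSBGs and 20,149 non-LSBGs.
labels = [1] * 2561 + [0] * 20149
train_idx, test_idx = split_labeled_sample(len(labels))
print(len(train_idx), len(test_idx))

# Positive fraction of the full labeled sample (about 11%), which is why
# class weighting matters for the classifier trained on it.
print(round(sum(labels) / len(labels), 3))
```

Because true LSBGs make up only about 11% of the labeled objects, a stratified split that preserves this fraction in both sets would be a reasonable alternative.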


3.2.2. Model, Features and Classification 


Before training the model, it is necessary to select a machine
learning algorithm. We tested and evaluated the widely used
algorithms of Random Forest (via the Python library
SCIKIT-LEARN; Pedregosa et al. 2011), XGBoost (Chen & Guestrin
2016; via the Python library XGBOOST), Naive Bayes,
AdaBoost, K Nearest Neighbors, Decision Tree, Support
Vector Machine and SVM with a radial basis function kernel
(via AUTO-SKLEARN, an automated toolkit that integrates
diverse machine learning algorithms; Feurer et al. 2015).
Among these models, we selected XGBoost, which achieved
the highest accuracy on the test set, as our machine
learning model in this study.

Aside from the model, we need to choose useful features
for learning. We tested and assessed the
quality of different feature combinations by
varying one feature group at a time (the control variable method). When the accuracy of the model takes
the first priority, we find that it is best to use all of the
following 24 features in learning, which are listed in order
of importance.


1. The ellipticity of objects, 1 - b/a.

2. The half-light radius, shape_r.

3. The colors $g - r$, $g - z$ and $r - z$ derived from the
Galactic extinction corrected magnitudes.

4. The Galactic extinction corrected magnitudes in the g, r
and z bands, mag_corr.

5. The profile-weighted fraction of the flux from other 
sources divided by the total flux in the g, r and z bands, 
fracflux. 

6. The fraction of a source's flux within the detection in the
g, r and z bands, fracin.


Research in Astronomy and Astrophysics, 24:055015 (13pp), 2024 May 




Du et al. 


Figure 2. The composite images of the g, r and z bands from the Legacy Surveys DR9 for the common sources of contamination in the initial LSBG candidates. The
size is $1.1' \times 1.1'$ for all of the panels except for panel (d), which is $3.84' \times 3.84'$. The false LSBG candidates are at the center of each panel.

Figure 3. The 26 areas (blue) selected to generate a labeled set for training and testing the machine learning model. The black solid outline encloses the entire sky area of
BASS+MzLS.


7. The profile-weighted fraction of pixels masked from all 
observations of this object in the g, r and z bands, 
fracmasked. 

8. The mean effective surface brightness in the g, r and z 
bands, mu_mean. 

9. The power-law index for the Sérsic profile model, 
measured as sersic in the Tractor catalog. 

10. The central surface brightness in the g, r and z bands,
mu_0, which is converted from mu_mean by the
transformation provided in Graham & Driver
(2005).
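The conversion in item 10 can be sketched with the Sérsic-profile relations of Graham & Driver (2005): $\langle\mu\rangle_e = \mu_e - 2.5\log_{10} f(n)$ with $f(n) = n e^{b} \Gamma(2n) / b^{2n}$, and $\mu_0 = \mu_e - 2.5 b / \ln 10$. The sketch below is not the authors' code. As an assumption, it replaces the exact solution for $b$ with the asymptotic approximation of Ciotti & Bertin (1999), which is accurate for $n \gtrsim 0.36$; the function names are hypothetical.

```python
import math


def sersic_b(n):
    """Approximate Sersic b_n (asymptotic series, valid for n >~ 0.36)."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n ** 2)


def central_from_mean_sb(mu_mean, n):
    """Convert mean effective surface brightness to central surface brightness."""
    b = sersic_b(n)
    # f(n) links the mean surface brightness within r_e to the value at r_e.
    f_n = n * math.exp(b) * math.gamma(2.0 * n) / b ** (2.0 * n)
    mu_e = mu_mean + 2.5 * math.log10(f_n)
    # The central value is brighter than the value at r_e by 2.5 b / ln(10).
    return mu_e - 2.5 * b / math.log(10.0)


# For an exponential disk (n = 1), the center is about 1.12 mag arcsec^-2
# brighter than the mean surface brightness within r_e.
offset = central_from_mean_sb(25.0, 1.0) - 25.0
print(round(offset, 3))
```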


For the training, our principle was to obtain a model that
maximizes the Recall, so that
the true LSBGs in the training sample are retained among the
positive predictions as completely as possible, while keeping
the Precision (the proportion of true LSBGs among the
predicted LSBGs) as high as possible. To evaluate the model at a
balance between Recall and Precision, we adopted the F$_\beta$-measure
as the evaluation metric, which
is the weighted harmonic mean of the Precision
and the Recall. Since Recall should carry
a greater weight than Precision under our principle, we use $\beta = 2$, a
commonly used value, for the F$_\beta$-measure
in model evaluation. With these guidelines, we trained the
XGBoost model using grid search and OPTUNA, a
hyperparameter optimization framework (Akiba et al. 2019), to
optimize its hyperparameters. After
thousands of optimization trials, we derived the trained
XGBoost model with the optimized hyperparameters, such as
max_depth = 6, n_estimators = 337, learning_rate ≈ 0.09,
subsample ≈ 0.393, scale_pos_weight = 8 and so on.
Subsequently, this XGBoost model was applied to the test
set, and the results are displayed in the
confusion matrix (Figure 4). The Recall,
defined as the fraction of true LSBGs that are classified as LSBGs
by the model (Recall = TP/(FN+TP)), is ~92.5%. For the
minor fraction of true LSBGs that were classified as non-LSBGs
and the false LSBGs that were classified as LSBGs by
the model, we visually inspected their images and found that
they are too faint to allow a reliable classification. In
addition, the Precision, defined as the fraction of
predicted LSBGs that are true LSBGs (Precision = TP/(FP+TP)),
is ~61.7%, meaning that approximately 40% of the
LSBG candidates obtained after machine
learning are non-LSBGs. We validate this fraction in
Section 3.3.

Figure 4. The confusion matrix of our XGBoost classifier evaluated on the test
set. The quoted numbers are the numbers of test objects with each combination of
true and predicted labels. The Recall is ~92.5%.
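To make the evaluation metric concrete, the sketch below computes the F$_\beta$-measure, $F_\beta = (1 + \beta^2) P R / (\beta^2 P + R)$, from the Precision and Recall quoted above. The resulting scores are derived here from those two rounded values and are not numbers reported in the text.

```python
def f_beta(precision, recall, beta=2.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    beta_sq = beta ** 2
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


precision, recall = 0.617, 0.925  # values quoted for the test set

# With beta = 2 the high recall dominates the score.
print(round(f_beta(precision, recall, beta=2.0), 3))
# The balanced F1 score is noticeably lower because of the modest precision.
print(round(f_beta(precision, recall, beta=1.0), 3))
```

The gap between the two scores shows the design choice: with $\beta = 2$ the optimization tolerates false positives, which are removed later by visual inspection, in exchange for losing few true LSBGs.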

With the help of machine learning, the number of
initial LSBG candidates was reduced from 344,370 to
57,934. However, according to the Precision of the model,
the 57,934 LSBG candidates are expected to still contain
~40% non-LSBGs, so we visually inspect
the images of the 57,934 candidates again to purify the sample
in the next section.


3.3. Visual Inspection 


In this section, we visually inspected the grz-composite
images of the 57,934 LSBG candidates retained after the
machine learning classification. We found that the sample
still contains false LSBGs whose appearance in the
images is nothing like that of true LSBGs, but whose
main features listed in Section 3.2.2, as measured by Tractor,
follow those of the true LSBGs. This makes it challenging
for our model to classify them as non-LSBGs, because the model was
trained solely on the catalog features, a choice we made to keep the
learning and classification of the LSBGs fast in this work.
In the future, we plan to train a better deep learning
classification model using both the features and the images of the
final LSBG sample selected in this work.

Figure 5. The diagram of $g - r$ vs. $g - z$ for the final sample of LSBG candidates
(green) and the contours from the kernel density estimates (black contours).
The top and the right panels show the histograms of $g - r$ and $g - z$,
respectively. In the top panel, the $g - r$ distribution is best fitted by the sum
(black solid profile) of two single Gaussian profiles (blue and red solid
curves). For comparison, the gray dashed curve represents the fitted single
Gaussian profile, which is rejected according to the evaluation by AIC/BIC.
The vertical black dashed line at $g - r = 0.60$ is the dividing line that separates
red ($g - r > 0.60$) from blue ($g - r < 0.60$) LSBG candidates.

Specifically, the false LSBGs in the current sample still
resemble the contaminants shown in Figure 2.
Therefore, we rejected them by visual inspection and
obtained a final sample of 31,825 LSBG candidates with a
high purity of true LSBGs, with half-light radius
$2.5'' < r_{\mathrm{eff}} < 20''$ and Galactic extinction-corrected mean
effective surface brightness $24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$.
This final sample is so far the largest catalog of LSBG
candidates from the ~5500 deg$^2$ sky area of BASS+MzLS,
which is more than one-third of the entire sky area of DR9 of the
DESI Legacy Surveys.


4. LSBG Properties 


We successfully established a sample of 31,825 LSBG 
candidates from the BASS+MzLS, spanning a wide range of 
properties, such as the color, morphology and environment, 
which will be studied in detail in this section. 


4.1. Color Distribution 


We display the distribution of the final sample of LSBG
candidates in the color-color diagram of $g - r$ versus $g - z$ in
Figure 5. The sample galaxies (green dots) exhibit a bimodal
distribution in the $g - r$ color, which requires a fit
by the sum of two Gaussian profiles rather than a
single Gaussian profile according to our evaluation by the
Akaike Information Criterion (AIC/AICc) and the Bayesian
Information Criterion (BIC; see details in Section 5.1).

Figure 6. The grz-composite images from the DESI Legacy Surveys DR9 for several example LSBG candidates from the blue (left) and red (right) subsamples. The
frame size for each LSBG candidate is $1.1' \times 1.1'$.

In Figure 5 (the top panel), the best-fitting profile (black
solid curve) evaluated by AIC/BIC is the sum of a blue
component, represented by a single Gaussian profile with a peak
at a blue $g - r$ color of 0.455 and σ of 0.103 (blue solid
curve), and a red component, represented by a single Gaussian
profile with a peak at a red $g - r$ color of 0.700 and σ of
0.070 (red solid curve). The blue component is
dominated by the blue LSBG candidates, of which 97.8% are
bluer than $g - r = 0.66$, while the red component is dominated
by the red LSBG candidates, of which 97.8% are redder than
$g - r = 0.56$. This means that galaxies between $g - r = 0.56$
and 0.66 are a mixture of LSBG candidates from the red end of
the blue component ($g - r > 0.56$) and those from the blue end
of the red component ($g - r < 0.66$). Since the median color of
all of the galaxies between 0.56 and 0.66 is 0.60 in $g - r$, we
adopt $g - r = 0.60$ as the color dividing line (vertical black
dashed line) to separate the final sample of LSBG candidates
into two subsamples of the blue ($g - r < 0.60$; 26,672 galaxies)
and the red ($g - r > 0.60$; 5153 galaxies). The median $g - r$
colors of the blue and red subsamples are 0.44 and 0.67,
respectively.
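The overlap region can be checked directly from the fitted Gaussian parameters: $g - r = 0.66$ lies about $2\sigma$ redward of the blue peak, and $g - r = 0.56$ lies $2\sigma$ blueward of the red peak. The sketch below evaluates the corresponding Gaussian fractions with the standard library. It uses the rounded parameters quoted above, so the result (about 97.7%) is expected to differ slightly from the quoted 97.8%.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


# Fraction of the blue component (peak 0.455, sigma 0.103) bluer than 0.66.
blue_below = gaussian_cdf(0.66, 0.455, 0.103)
# Fraction of the red component (peak 0.700, sigma 0.070) redder than 0.56.
red_above = 1.0 - gaussian_cdf(0.56, 0.700, 0.070)

print(round(blue_below, 3), round(red_above, 3))
```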


In Figure 6, we show randomly selected LSBG candidates
from the blue (left) and the red (right) subsamples as
examples. The blue LSBG candidates appear disk-like,
spiral or irregular, while the red ones tend to be spheroidal
or elliptical. The two subsamples are clearly distinct in
morphology, implying that the colors of LSBGs
correlate with their morphologies. Such a conclusion is also
supported by several previously published studies, which will
be discussed in Section 5.2.


4.2. Magnitude and Surface Brightness 


In Figure 7(a) the distributions of the magnitudes in the g, r
and z bands are shown for the entire final sample of LSBG
candidates. In Figure 7(b), the distribution of the g-band
magnitude is compared between the blue and red subsamples,
showing that the blue galaxies are slightly brighter than the red in
g-band apparent magnitude. In Figure 7(c), we show the
distribution of the mean surface brightness $\bar{\mu}_{\mathrm{eff},g}$ for the blue
and red subsamples, respectively. We find that the red
subsample shows a bump or an excess at the low surface
brightness tail (fainter than 25.5 mag arcsec$^{-2}$ in g) while the blue
subsample has slightly more LSBGs with higher surface
brightness (brighter than 25.5 mag arcsec$^{-2}$ in g), implying that
the red LSBGs in our sample tend to have lower
surface brightness while the blue ones tend to have higher
surface brightness. This is further supported by the
statistics that the 16th, 50th and 84th percentiles of $\bar{\mu}_{\mathrm{eff},g}$ are
24.4, 24.7 and 25.5 mag arcsec$^{-2}$ for the blue subsample and 24.4,
24.8 and 25.8 mag arcsec$^{-2}$ for the red subsample, respectively.

Figure 7. The distribution of g-, r- and z-band magnitudes of the final sample of LSBG candidates (a). The g-band magnitude (b) and mean surface brightness $\bar{\mu}_{\mathrm{eff},g}$ (c)
are displayed for the blue and the red subsamples, respectively.

Figure 8. The distributions of the ellipticity of the final sample of LSBGs are displayed for the blue and the red subsamples, respectively, in panel (a), which shows
that a considerable fraction of the LSBGs have zero ellipticity in the Tractor catalog. In panel (b), we exclude the galaxies with zero ellipticity to give a clear picture of the
distribution for the galaxies with non-zero ellipticity.


4.3. Ellipticity, Effective Radius and Sérsic Index 


In Figure 8(a), we present the distribution of ellipticity
($e = 1 - b/a$) for the full final sample. It shows that both the
blue and red subsamples have considerable fractions of
galaxies with zero e in the Tractor catalog (10% of the
blue and 18% of the red). The median e is ~0.31 for the full
sample, ~0.32 for the blue and ~0.28 for the red. To give a
clear picture of the e distribution for the galaxies with
non-zero e, we plot them separately in panel (b), where
both subsamples show generally consistent distributions, with
a median e of 0.34 for the blue and 0.31 for the red. All of
these e distributions demonstrate that the LSBG candidates in
our final sample are preferentially round within the range e = 0.1-0.7,
unlike normal spiral galaxies, which show a nearly
flat ellipticity distribution between e = 0.1 and 0.7 (Figure 4 in
Rodríguez & Padilla 2013).


In Figure 9(a), both subsamples are dominated (more than
99%) by galaxies with sizes ranging from $2.5''$ to $14''$ in $r_{\mathrm{eff}}$,
with the medians being $3.5''$ for the red sample, $4.1''$ for the blue
sample and $4''$ for the full sample. It is worth noting that the $r_{\mathrm{eff}}$
measurements from the Tractor catalog for a minority of
large spiral galaxies are all given to be around $13.8''$, causing
a low peak to occur at $r_{\mathrm{eff}} \approx 13.8''$ in the figure. This low
peak, which arises from a limitation of the Tractor model measurements,
has no physical meaning, but the galaxies in this peak
all appear to be blue, large, diffuse and extended disk LSBGs
in our visual inspection. Thus, we kept these galaxies in
our final sample. In Figure 9(b), we plot the Sérsic index n for
the blue and the red subsamples, showing that 95% of the blue
subsample have n < 2.5 while 93% of the red subsample have
n < 2.5. The distributions of n agree with each other for the two
subsamples, with a median of n = 1 for each, demonstrating
that our final sample is dominated by disk LSBGs.


4.4. Spatial Distribution 


Figure 9. The distribution of the half-light radius $r_{\rm eff}$ (left) and the Sérsic index $n$ (right) for the blue and the red subsamples.


In Figure 10 we show the spatial distribution of the blue
(top) and the red (bottom) subsamples over the sky area within
the BASS+MzLS footprint (black solid line). We find an obvious
difference between the spatial distributions of the two
subsamples. The blue LSBGs are more uniformly distributed
while the red population is clustered, showing that red
LSBGs preferentially inhabit denser environments than blue
LSBGs. The same trend was found by Greco et al. (2018) and
Tanoglidis et al. (2021b), and we discuss it in Section 5.2.
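
The contrast between a uniform and a clustered distribution can be expressed numerically, for example through the median nearest-neighbor angular separation. The sketch below is only an illustration under stated assumptions: it is not the analysis performed in this work (which relies on Figure 10), the two point sets are synthetic, and the function names, the number of mock clusters and the 0.3 deg scatter are hypothetical choices.

```python
import math
import random
import statistics


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def median_nn_separation(points):
    """Median distance (deg) from each point to its nearest neighbor."""
    nearest = []
    for i, (ra_i, dec_i) in enumerate(points):
        nearest.append(min(
            angular_separation(ra_i, dec_i, ra_j, dec_j)
            for j, (ra_j, dec_j) in enumerate(points) if j != i
        ))
    return statistics.median(nearest)


def uniform_point():
    """Random point, uniform on the sphere within a mock RA/DEC box."""
    ra = random.uniform(100.0, 280.0)
    sin_dec = random.uniform(math.sin(math.radians(30.0)),
                             math.sin(math.radians(80.0)))
    return ra, math.degrees(math.asin(sin_dec))


random.seed(1)

# Synthetic "uniform" population.
uniform = [uniform_point() for _ in range(200)]

# Synthetic "clustered" population: 10 mock groups of 20 members each.
clustered = []
for ra_c, dec_c in [uniform_point() for _ in range(10)]:
    for _ in range(20):
        dec = dec_c + random.gauss(0.0, 0.3)
        ra = ra_c + random.gauss(0.0, 0.3) / math.cos(math.radians(dec_c))
        clustered.append((ra, dec))

nn_uniform = median_nn_separation(uniform)
nn_clustered = median_nn_separation(clustered)
print(f"median NN separation, uniform:   {nn_uniform:.2f} deg")
print(f"median NN separation, clustered: {nn_clustered:.2f} deg")
```

With equal numbers of objects over the same area, the clustered set has a much smaller median nearest-neighbor separation. A real analysis would also need to account for the survey footprint and masks, which this sketch ignores.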


5. Discussion 
5.1. Double or Single Gaussian Fitting? 


In Section 4.1 we reported that our LSBG sample has a
bimodal color distribution that is better fitted by a double
Gaussian model (a mixture of two Gaussians) than by a single
Gaussian model. This statement is supported by evaluating the
fits of the single Gaussian model (SGM; gray dashed line in the
top panel of Figure 5) and the double Gaussian model (DGM;
black line in the top panel of Figure 5) with the Akaike and
Bayesian information criteria (AIC and BIC), defined in
Equation (5):


$$
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k^{2} + 2k}{n - k - 1}, \\
\mathrm{BIC} &= k\ln(n) - 2\ln\hat{L},
\end{aligned} \tag{5}
$$


where $k$ is the number of fitting parameters, $\hat{L}$ is the
likelihood function and $n$ is the number of samples. When the
sample is small in size, AIC should be corrected to AICc.
According to Kass & Raftery (1995), the performance of the model
improves as the AICc or BIC value decreases.

We fit the $g - r$ color distribution of our sample with a
single Gaussian model to obtain BIC$_{\rm SGM}$ and
AICc$_{\rm SGM}$, and with the double Gaussian model to obtain
BIC$_{\rm DGM}$ and AICc$_{\rm DGM}$. The differences between the
SGM and DGM are then calculated as
$\Delta{\rm BIC} = {\rm BIC}_{\rm SGM} - {\rm BIC}_{\rm DGM}$ and
$\Delta{\rm AICc} = {\rm AICc}_{\rm SGM} - {\rm AICc}_{\rm DGM}$.
According to Kass & Raftery (1995), the DGM is preferred if
$\Delta$BIC or $\Delta$AICc is larger than 10. In our
calculation, $\Delta$AICc and $\Delta$BIC are 235.1 and 227.1,
respectively, both far greater than 10, giving strong evidence
that the $g - r$ color distribution is much better fitted by a
double Gaussian model than by a single Gaussian model. We
therefore conclude that the $g - r$ color distribution of the
final LSBG sample is bimodal. Bimodal color distributions have
also been reported for the earlier LSBG samples of Greco et al.
(2018) and Tanoglidis et al. (2021b), which are discussed in
detail in Section 5.2.
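
The model comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the fitting code used in this work: the colors are synthetic draws from two hypothetical Gaussians, the two-component fit uses a basic expectation-maximization loop, and the parameter counts ($k = 2$ for the SGM, $k = 5$ for the DGM) are our assumption about how the fitting parameters are counted.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Normal probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def information_criteria(log_l, k, n):
    """Return (AIC, AICc, BIC) as written in Equation (5)."""
    aic = 2 * k - 2 * log_l
    aicc = aic + (2 * k * k + 2 * k) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_l
    return aic, aicc, bic


def loglike_single(data):
    """Maximized log-likelihood of one Gaussian (mean, sigma)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)


def loglike_double(data, n_iter=200):
    """Log-likelihood of a two-Gaussian mixture fitted with EM."""
    n = len(data)
    ordered = sorted(data)
    half = n // 2
    mean = sum(data) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Start from the means of the lower and upper halves of the data.
    mus = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: membership probability of each point in each component.
        resp = []
        for x in data:
            p0 = weights[0] * gaussian_pdf(x, mus[0], sigmas[0])
            p1 = weights[1] * gaussian_pdf(x, mus[1], sigmas[1])
            resp.append((p0 / (p0 + p1), p1 / (p0 + p1)))
        # M-step: update weights, means and widths.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # avoid a collapsed width
    return sum(
        math.log(sum(weights[j] * gaussian_pdf(x, mus[j], sigmas[j])
                     for j in range(2)))
        for x in data
    )


# Synthetic g - r colors drawn from two hypothetical populations.
random.seed(0)
colors = ([random.gauss(0.40, 0.08) for _ in range(700)]
          + [random.gauss(0.70, 0.08) for _ in range(300)])
n = len(colors)

_, aicc_sgm, bic_sgm = information_criteria(loglike_single(colors), 2, n)
_, aicc_dgm, bic_dgm = information_criteria(loglike_double(colors), 5, n)

delta_aicc = aicc_sgm - aicc_dgm
delta_bic = bic_sgm - bic_dgm
print(f"Delta AICc = {delta_aicc:.1f}, Delta BIC = {delta_bic:.1f}")
```

Positive differences larger than 10 favor the double Gaussian model, following the criterion quoted from Kass & Raftery (1995). The numerical values printed here depend on the synthetic data and are unrelated to the values of 235.1 and 227.1 obtained for the real sample.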


5.2. Comparison with Previous Samples 


In this section, we compare our sample of LSBG candidates
with three other LSBG samples from Du et al. (2015) (D15),
Greco et al. (2018) (G18) and Tanoglidis et al. (2021b) (T21).
D15 provide a sample of 1129 LSBGs selected from the
2800 deg$^2$ area of the α.40-SDSS DR7 survey, with an imaging
depth of $r \sim 22.2$ mag for point sources at 95% detection
(York et al. 2000). This sample is defined by the central
surface brightness $\mu_{0,B} > 22.5$ mag arcsec$^{-2}$, and its
galaxies are nearby ($z < 0.06$), blue, H I-rich and
disk-dominated. G18 present a sample of 781 extended LSBGs from
the first ~200 deg$^2$ of the Wide layer of the HSC-SSP imaging
survey, which has a depth of $g \sim 26.8$, $r \sim 26.4$ and
$i \sim 26.4$ mag for point sources at 5σ (Aihara et al. 2018b).
This sample is defined by the mean surface brightness
($\bar{\mu}_{\mathrm{eff},g} > 24.3$ mag arcsec$^{-2}$), which
allows nucleated galaxies into the sample, and by galaxy size
($r_{\rm eff} > 2.5"$), which restricts the sample to low
redshift. Using selection criteria similar to G18, T21 produce a
catalog of 23,790 extended LSBGs from the ~5000 deg$^2$ area of
the first three years of imaging data from the DES (DES Y3),
with a depth of $g \sim 23.52$, $r \sim 23.10$ and
$i \sim 22.51$ mag for point sources at 10σ, corresponding to a
3σ surface brightness limit of $g \sim 28.26$, $r \sim 27.86$
and $i \sim 27.37$ mag arcsec$^{-2}$.


Research in Astronomy and Astrophysics, 24:055015 (13pp), 2024 May 


Du et al. 


[Figure 10 image: two sky maps with axes RA [deg] and DEC [deg]. Top panel: blue LSBGs. Bottom panel (b): red LSBGs. The BASS+MzLS footprint is outlined in each panel.]

Figure 10. The spatial distributions of the blue (top) and the red (bottom) subsamples of LSBGs within the footprint of the BASS+MzLS survey (black solid line).


Figure 11. Comparisons of our sample (red) with the three LSBG samples from Du et al. (2015) (blue), Greco et al. (2018) (orange) and Tanoglidis et al. (2021b) (green) in terms of the mean surface brightness $\bar{\mu}_{\mathrm{eff},g}$ (left), $g$-band apparent magnitude (middle) and the $g - r$ color distribution (right).


In terms of the surface brightness (Figure 11(a)), our sample
is highly consistent with T21, ranging over
$24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$. The
16th, 50th and 84th percentiles of $\bar{\mu}_{\mathrm{eff},g}$
are 24.4, 24.7 and 25.5 mag arcsec$^{-2}$ for our sample and
24.3, 24.7 and 25.3 mag arcsec$^{-2}$ for T21, respectively. For
the G18 sample, the $\bar{\mu}_{\mathrm{eff},g}$ measurement is
not available in the released catalog, so we cannot overplot the
G18 sample in Figure 11(a) for a direct comparison with the
other three samples. However, G18 state that the
$\bar{\mu}_{\mathrm{eff},g}$ distribution of their sample is
broad, with the 16th, 50th and 84th percentiles being
$\bar{\mu}_{\mathrm{eff},g}$ = 24.5 (24.8), 24.8 (25.8) and
25.5 (26.8) mag arcsec$^{-2}$ for the blue (red) subsamples,
respectively. On this basis, we consider the G18 sample to have
a distribution of mean surface brightness quite similar to our
sample and T21. In stark contrast, the D15 sample has a mean
surface brightness distribution with 16th, 50th and 84th
percentiles of $\bar{\mu}_{\mathrm{eff},g}$ = 23.4, 23.7 and
24.6 mag arcsec$^{-2}$, respectively, which are much brighter
than the other three samples. This is reasonable because the D15
sample is from the SDSS imaging survey, which is shallower than
the BASS+MzLS, DES Y3 and HSC-SSP surveys on which our sample,
T21 and G18 are respectively based. This is further supported by
the comparison of magnitudes (Figure 11(b)), where our sample,
T21 and G18 are systematically at least ~2 mag fainter than the
D15 sample in $g$-band apparent magnitude.

In terms of color (Figure 11(c)), the 16th, 50th and 84th
percentiles of $g - r$ are 0.36, 0.47 and 0.60 for our sample,
0.29, 0.43 and 0.60 for the G18 sample, 0.26, 0.38 and 0.57 for
the T21 sample, and 0.20, 0.30 and 0.41 for D15, respectively.
Our sample generally agrees with G18 and T21 in the $g - r$
distribution, although the latter two samples are slightly
bluer. Among the comparison samples, D15 is the bluest because
its galaxies are H I-rich and dominated by blue LSBGs.
Additionally, we reported in Section 4.1 that our sample has a
bimodal distribution of the $g - r$ color, implying two distinct
populations of blue and red LSBGs. Such bimodal color
distributions have also been found by G18 and T21 for their own
LSBG samples. Specifically, the G18 sample shows a clear
bimodality in both the $g - r$ and $g - i$ colors, and is thus
divided into red and blue populations using the median
$g - i = 0.64$ as the dividing line. Similarly, the T21 sample
displays a bimodal distribution in both the $g - r$ and $g - i$
colors, and is separated into blue and red subsamples using the
intersection of the two Gaussian model profiles at
$g - i = 0.60$ as the threshold. The color distributions of all
three samples (ours, G18 and T21) demonstrate that LSBGs,
similar to galaxies with normal or high surface brightness
(normal galaxies), can be conventionally divided into a blue and
a red sequence, with the blue LSBGs dominated by spiral, disk or
irregular systems in terms of morphology and the red LSBGs by
spheroidal or elliptical morphology.

As for the environments, the blue and H I-rich LSBGs of D15
are mostly in voids or at the edges of filaments with low
densities. Our sample, G18 and T21 show consistent spatial
distributions, with the blue LSBG population of each sample
more uniformly distributed within the sky footprint and the red
population of each sample highly clustered. This implies that
the red LSBGs preferentially inhabit denser environments than
the blue LSBGs.
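
The color threshold that T21 take from the intersection of two Gaussian model profiles can be located analytically, because equating two weighted Gaussians gives a quadratic equation in the color. The sketch below is an illustration only: the function name and all component parameters are hypothetical, not fitted values from T21 or from this work.

```python
import math


def weighted_pdf(x, weight, mu, sigma):
    """Weighted normal density, i.e. one component of a mixture."""
    z = (x - mu) / sigma
    return weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_intersection(w1, mu1, s1, w2, mu2, s2):
    """Crossing points of two weighted Gaussians between their means.

    Setting w1*N(x; mu1, s1) = w2*N(x; mu2, s2) and taking logarithms
    gives a*x**2 + b*x + c = 0 with the coefficients below.
    """
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = mu1 / s1 ** 2 - mu2 / s2 ** 2
    c = (mu2 ** 2 / (2.0 * s2 ** 2) - mu1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal widths: the equation is linear.
        if b == 0.0:
            return []
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return []
        root = math.sqrt(disc)
        roots = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    lo, hi = sorted((mu1, mu2))
    return [x for x in roots if lo <= x <= hi]


# Equal weights and widths: the crossing is midway between the means.
symmetric = gaussian_intersection(0.5, 0.4, 0.1, 0.5, 0.8, 0.1)
print(f"symmetric case: {symmetric[0]:.3f}")

# Hypothetical blue and red components with unequal weights and widths.
blue = (0.7, 0.40, 0.08)
red = (0.3, 0.70, 0.10)
threshold = gaussian_intersection(*blue, *red)[0]
print(f"color threshold: {threshold:.3f}")
```

When the blue component carries more weight than the red one, the crossing point moves toward the red mean relative to the midpoint weighted by the widths, so the threshold depends on the fitted weights as well as on the means and widths.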

Furthermore, our sample is consistent with G18 and T21 in
the ellipticity distribution, with a median of around
$e \approx 0.3$, showing that the LSBGs of the three samples are
generally round. This contrasts strikingly with the almost flat
$e$ distribution of normal galaxies between $e = 0$ and 0.7.

These comparisons demonstrate that our sample, along with
G18 and T21, extends the SDSS-based LSBG samples to a new
regime of much lower surface brightness, fainter apparent
magnitude and a broader range of properties over a large sky
area.


5.3. Possible Evolution from the Blue to Red LSBGs? 


The optical colors of galaxies indicate their stellar
populations and correlate strongly with galaxy morphology and
environment. Among galaxies with normal or high surface
brightness, galaxies in the local universe fall into one of two
distinct populations in terms of optical color: a red sequence
and a blue cloud (Strateva et al. 2001; Baldry et al. 2004;
Blanton & Moustakas 2009). Besides color, bimodal distributions
have also been observed in other parameters, such as metallicity
and star formation rate (Kauffmann et al. 2003a, 2003b). The
blue cloud is dominated by active, star-forming galaxies while
the red sequence is composed of quiescent galaxies. The blue
galaxies are spiral, disk or irregular systems in morphology,
whereas the red galaxies are ellipticals, spheroidals,
lenticulars or cD galaxies (Blanton & Moustakas 2009). Moreover,
red galaxies are more likely to be found in denser environments
and are more spatially clustered than blue galaxies (Blanton &
Moustakas 2009; Das & Pandey 2024). It has been proposed that
blue galaxies evolve onto the red sequence as their stellar
populations fade after star formation is halted by a quenching
mechanism, such as the natural exhaustion of gas, AGN feedback,
galaxy harassment or galaxy mergers.
Similar to the blue cloud and red sequence of galaxies with
normal surface brightness, the LSBGs in this work show a bimodal
distribution in optical color, so they fall into two populations
in terms of the $g - r$ color: the blue and the red LSBGs. In
terms of morphology, the blue LSBGs are disk-like or irregular
while the red LSBGs are more bulge-dominated or spheroidal. In
addition, the red LSBGs are more spatially clustered than the
blue LSBGs. There may therefore be an evolutionary path from the
blue LSBGs to the red LSBGs, and we will investigate this issue
in future work.


6. Summary and Conclusions 


Based on the released photometric catalog from the Tractor
software and the machine learning model, we selected a sample
of 31,825 LSBG candidates with mean surface brightness
$24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$ and
half-light radius $2.5" < r_{\rm eff} < 20"$ from the
~5500 deg$^2$ of the BASS+MzLS survey. The selection criteria
are summarized in Table 1.
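
The two cuts quoted above can be written as a short filter. This sketch rests on an assumption: it uses the commonly adopted definition $\bar{\mu}_{\rm eff} = m + 2.5\log_{10}(2\pi r_{\rm eff}^{2})$ for the mean surface brightness within the half-light radius, which may differ in detail from the quantity used in this work, and it covers only these two cuts rather than the full set of criteria in Table 1. The function names and the example magnitudes and radii are hypothetical.

```python
import math


def mean_effective_surface_brightness(mag, r_eff_arcsec):
    """Mean surface brightness within r_eff, in mag arcsec^-2.

    Assumes mu_eff = m + 2.5 * log10(2 * pi * r_eff**2).
    """
    return mag + 2.5 * math.log10(2.0 * math.pi * r_eff_arcsec ** 2)


def passes_cuts(mag_g, r_eff_arcsec,
                mu_range=(24.2, 28.8), r_range=(2.5, 20.0)):
    """Apply the surface brightness and size cuts quoted in the text."""
    mu = mean_effective_surface_brightness(mag_g, r_eff_arcsec)
    return (mu_range[0] < mu < mu_range[1]
            and r_range[0] < r_eff_arcsec < r_range[1])


# Hypothetical (g magnitude, r_eff in arcsec) pairs.
examples = [(20.0, 4.0), (18.0, 3.0), (21.5, 2.0), (26.0, 10.0)]
for mag_g, r_eff in examples:
    mu = mean_effective_surface_brightness(mag_g, r_eff)
    print(f"g = {mag_g:4.1f}, r_eff = {r_eff:4.1f} arcsec, "
          f"mu_eff = {mu:5.2f}, selected = {passes_cuts(mag_g, r_eff)}")
```

Only the first example passes both cuts: the second is too bright in surface brightness, the third is too small, and the fourth is fainter than the upper surface brightness limit.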

This sample shows a bimodal distribution in the $g - r$
color, implying two distinct populations of blue
($g - r < 0.60$) and red ($g - r > 0.60$) LSBGs. The blue
population is dominated by spiral, disk or irregular systems
while the red one appears spheroidal or elliptical in
morphology, revealing that the colors of LSBGs correlate with
morphology. In terms of apparent magnitude and surface
brightness, the red LSBGs are slightly fainter than the blue.
Both populations have similar distributions of ellipticity,
half-light radius (median $r_{\rm eff} \approx 4"$) and Sérsic
index (median $n = 1$). The ellipticity $e$ of both populations
ranges from 0 to 0.7 with a median of ~0.3, indicating that the
sample galaxies are generally round. This differs from normal
spiral galaxies, which show a nearly flat distribution between
$e = 0$ and 0.7. The half-light radii are within ~2.5"-14", with
a median $r_{\rm eff} \approx 4"$. In terms of Sérsic index, the
blue and the red LSBG populations are both dominated by disk
galaxies with $n \sim 1$. However, the two populations differ in
their spatial distributions, with the blue LSBGs more uniformly
distributed across the sky area while the red ones are highly
clustered. This sample will be valuable for further studies of
the possible evolutionary link between the two LSBG populations.

A comparison with three other samples of LSBGs demonstrates
that our sample of LSBG candidates extends the study of LSBGs
to a regime of lower surface brightness, fainter magnitude and
broader properties than the previous SDSS-based LSBG samples.
This sample is also well suited for training higher-performance
deep learning models to automatically identify LSBGs in the
large data volumes expected from wider and deeper imaging
surveys in the future.